Evaluating Machine Learning Methods for Online Game Traffic Identification
Authors
Abstract
Online gaming is becoming more and more prominent in the Internet, in terms of both traffic volume and as a potential source of revenue. Quality of Service (QoS) requirements for highly interactive games are much stricter than for traditional Internet applications, such as web or email. For effective QoS implementations that are transparent to users and game applications, an accurate and reliable method of classifying game traffic flows in the network must be found. Current methods such as port number and payload-based identification exhibit a number of shortfalls. A potential solution is the use of Machine Learning techniques to identify game traffic based on payload independent statistical features such as packet length distributions. In this paper we evaluate the effectiveness of the proposed approach. We compare the accuracy and performance of different Machine Learning techniques and we also use feature selection techniques to examine which features are most important in discriminating game traffic from other traffic. We find that machine learning algorithms are able to separate online game traffic from other network traffic with very high (>99%) accuracy. We also show that feature selection, while reducing accuracy, allows games to be identified with fewer features and substantial speed gains.

Keywords: Game Traffic Classification, Machine Learning, Statistical Features

I. INTRODUCTION

The Internet is experiencing an increase in the use and commercialisation of interactive applications such as telephony and online gaming. Online gaming in particular is expected to become a large source of income, through either subscription-based games or dedicated gaming services. Internet Service Providers may also charge a premium for Quality of Service (QoS)-enhanced accounts targeted at gamers.
Highly interactive online games, such as First Person Shooter (FPS) games, have a narrow tolerance to network issues such as delay, jitter and packet loss (see [1], [2]), necessitating more rigid QoS than the best-effort service used for traditional Internet applications such as web or email. For QoS to be effective, however, an accurate and timely method of identifying and classifying network gaming flows is required. As it is unlikely that game applications will ever explicitly signal their QoS demands to the network, the network must identify game flows and establish adequate QoS for these flows. Once highly interactive game traffic can be identified, it can be given a higher priority over other traffic in the network. We presented the architecture and advantages of such a system in [3].

Current popular methods of classifying network applications include TCP/UDP port-based identification and payload-based identification. The latter can be further divided into protocol decoding and signature-based identification. With protocol decoding the classifier actually decodes the application protocol, while signature-based methods search for application-specific byte sequences in the payload.

Port-based classification systems are moderately accurate at best and will become less effective in the near future. For example, a server hosting multiple games or instances of the same game might use an arbitrary port rather than the specified default port, making port-to-application mappings unpredictable. Payload-based classification relies on specific application data, making it difficult to detect a wide range of applications or stay up to date with new applications. In addition, the process of creating rules for signature-based classification must often be done by hand, which can be very time consuming.
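The two conventional approaches described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the port table, the byte signatures, and the function names are hypothetical values chosen only to show why a game server on a non-default port defeats a port lookup, and why signatures must be written per application.

```python
# Hypothetical port-to-application table (illustrative values, not from the paper).
DEFAULT_PORTS = {
    ("udp", 27015): "game",  # assumed default port of some game server
    ("tcp", 80): "web",
    ("tcp", 25): "mail",
}

# Hypothetical hand-written payload signatures, one per application.
SIGNATURES = {
    b"GET ": "web",
    b"EHLO": "mail",
}


def classify_by_port(protocol, server_port):
    """Port-based identification: a plain table lookup on the server port."""
    return DEFAULT_PORTS.get((protocol, server_port), "unknown")


def classify_by_signature(payload):
    """Signature-based identification: search the payload for known byte sequences."""
    for signature, application in SIGNATURES.items():
        if signature in payload:
            return application
    return "unknown"


# A game server on its default port is recognised ...
print(classify_by_port("udp", 27015))
# ... but a second instance on an arbitrary port is not.
print(classify_by_port("udp", 27016))
# A payload with no matching hand-written signature stays unclassified.
print(classify_by_signature(b"\x17\x03\x03\x00\x20"))
```

Both functions depend entirely on tables that someone has to maintain by hand, which is the maintenance burden the text attributes to these methods.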
Machine learning (ML) techniques [5] provide a promising alternative by classifying flows based on statistical features that are independent of the application protocol (payload). The features used in this study are flow characteristics such as packet length and inter-arrival times. This approach does not require packet payload, and the classifier can be trained automatically, assuming a representative training dataset can be obtained. A more general introduction to the problem is presented in [6] and [7].

We have previously used a wide range of machine learning algorithms to separate common network applications, such as web and mail traffic [24]. In this paper we apply several of the better performing algorithms to the task of separating network games from generic (i.e. common) network traffic. Although this is not the main focus, we also investigate how effectively different games can be separated from each other. As the
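As a rough illustration of the payload-independent flow features mentioned in the introduction (packet length and inter-arrival times), the following Python sketch summarises one flow from packet timestamps and sizes alone. The exact feature set used in the paper is not given in this excerpt, so the choice of minimum, mean, maximum and standard deviation, as well as the function and key names, are assumptions made for the example.

```python
from statistics import mean, pstdev


def flow_features(packets):
    """Summarise one flow without looking at any payload bytes.

    packets: list of (timestamp_seconds, length_bytes) tuples in arrival order.
    The chosen statistics are an assumption, not the paper's feature list.
    """
    if len(packets) < 2:
        raise ValueError("need at least two packets to compute inter-arrival times")
    times = [timestamp for timestamp, _ in packets]
    lengths = [length for _, length in packets]
    # Inter-arrival time: gap between each packet and the one before it.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "pkt_len_min": min(lengths),
        "pkt_len_mean": mean(lengths),
        "pkt_len_max": max(lengths),
        "pkt_len_std": pstdev(lengths),
        "iat_min": min(gaps),
        "iat_mean": mean(gaps),
        "iat_max": max(gaps),
        "iat_std": pstdev(gaps),
    }


# Three packets of a hypothetical flow: (arrival time in seconds, size in bytes).
example_flow = [(0.0, 60), (0.5, 80), (1.5, 100)]
print(flow_features(example_flow))
```

A vector like this, computed per flow and paired with a known application label, is the kind of input a supervised classifier could be trained on.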
Similar resources
Behavioral Analysis of Traffic Flow for an Effective Network Traffic Identification
Fast and accurate network traffic identification is becoming essential for network management, high quality of service control and early detection of network traffic abnormalities. Techniques based on statistical features of packet flows have recently become popular for network classification due to the limitations of traditional port and payload based methods. In this paper, we propose a metho...
Research on Online Game Traffic Classification Based on Machine Learning
This paper summarizes online game flow attributes by observing a great number of game data packets and computes their flow features using the Python programming language. Furthermore, we investigate several machine learning algorithms to classify five different online games automatically and correctly, with an average accuracy of over 80%. The test results show that machine learning has the...
Realtime Encrypted Traffic Identification using Machine Learning
Accurate network traffic identification plays important roles in many areas such as traffic engineering, QoS and intrusion detection etc. The emergence of many new encrypted applications which use dynamic port numbers and masquerading techniques causes the most challenging problem in network traffic identification field. One of the challenging issues for existing traffic identification methods ...
Evaluating Machine Learning Algorithms for Automated Network Application Identification
The identification of network applications that create traffic flows is vital to the areas of network management and surveillance. Current popular methods such as port number and payload-based identification are inadequate and exhibit a number of shortfalls. A potential solution is the use of machine learning techniques to identify network applications based on payload independent statistical f...
Evaluating machine learning methods and satellite images to estimate combined climatic indices
The reflections recorded on satellite images have been affected by various environmental factors. In these images, some of these factors are combined with other environmental factors that cannot be distinguished. Therefore, it seems wise to model these environmental phenomena in the form of hybrid indicators. In this regard, satellite imagery and machine learning methods can play a unique role ...